

Supplementary Material -- Towards Reliable Model Selection for Unsupervised Domain Adaptation: An Empirical Study and A Certified Baseline

Neural Information Processing Systems

We first prove the first inequality using Jensen's inequality, which states that for a real-valued convex function f, f(E[X]) ≤ E[f(X)]. Next, we leverage the properties of inequalities to prove the second inequality. Directly taking the source risk as the target risk is unreliable due to the distribution shift between domains, and this approach has limited effectiveness in scenarios with severe domain shifts between the source and target domains. Reverse Validation instead performs a reversed adaptation from the pseudo-labeled target domain back to the source domain and uses the source risk of this reversed adaptation task for validation.
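Jensen's inequality, as invoked above, can be written out as follows (stated here in its standard probabilistic form; the symbols are generic, not the paper's notation):

```latex
% Jensen's inequality: for a real-valued convex function f
% and a random variable X with finite expectation,
f\big(\mathbb{E}[X]\big) \;\le\; \mathbb{E}\big[f(X)\big].
% For concave f the inequality is reversed.
```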




A Additional prompt data details

Neural Information Processing Systems

Destination will be a red barn on the right.

Use Case and Example pairs:
- rewrite: "Rewrite the following text to be more light-hearted: {very formal text}"
- chat: "The following is a conversation with an AI assistant."



Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

Wei, Tianwen, Zhu, Bo, Zhao, Liang, Cheng, Cheng, Li, Biye, Lü, Weiwei, Cheng, Peng, Zhang, Jianhao, Zhang, Xiaoyu, Zeng, Liang, Wang, Xiaokun, Ma, Yutuan, Hu, Rui, Yan, Shuicheng, Fang, Han, Zhou, Yahui

arXiv.org Artificial Intelligence

In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.
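The gating logit normalization described above can be sketched as standardizing the router's logits before the softmax, with a scaling factor controlling how sharp (and thus how diversified) the resulting expert distribution is. This is a minimal illustrative sketch; the function name, the exact normalization details, and the `lam` hyperparameter are assumptions, not the report's actual implementation.

```python
import numpy as np

def normalized_gating(logits, lam=1.0, eps=1e-6):
    """Standardize gating logits per token, then rescale by `lam`
    before the softmax. Larger `lam` yields a sharper gate
    distribution, encouraging clearer expert selection."""
    z = (logits - logits.mean(axis=-1, keepdims=True)) / (
        logits.std(axis=-1, keepdims=True) + eps)
    z = lam * z
    # numerically stable softmax over the expert dimension
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

In this reading, `lam` plays a role analogous to an inverse temperature: the normalization fixes the scale of the logits, and `lam` then sets how peaked the gate output is independently of how the raw logits drift during training.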


Explaining Veracity Predictions with Evidence Summarization: A Multi-Task Model Approach

Cekinel, Recep Firat, Karagoz, Pinar

arXiv.org Artificial Intelligence

The rapid dissemination of misinformation through social media has increased the importance of automated fact-checking. Furthermore, studies on what deep neural models attend to when making predictions have increased in recent years. While significant progress has been made in this field, it has not yet reached a level of reasoning comparable to human reasoning. To address these gaps, we propose a multi-task explainable neural model for misinformation detection. Specifically, this work formulates the generation of explanations for the model's veracity predictions as a text summarization problem. Additionally, the performance of the proposed model is evaluated on publicly available datasets and the findings are compared with related studies.
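A multi-task setup of the kind described typically trains the veracity classifier and the explanation summarizer under one joint objective. The sketch below shows the simplest such combination; the function name and the `alpha` weighting are illustrative assumptions, not details taken from the paper.

```python
def multitask_loss(veracity_loss, summary_loss, alpha=0.5):
    """Joint objective for a multi-task explainable model: the veracity
    classification loss is combined with the explanation-summarization
    loss, so both heads update a shared encoder. `alpha` trades off
    prediction accuracy against explanation quality."""
    return veracity_loss + alpha * summary_loss
```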


Mitigating Negative Transfer in Multi-Task Learning with Exponential Moving Average Loss Weighting Strategies

Lakkapragada, Anish, Sleiman, Essam, Surabhi, Saimourya, Wall, Dennis P.

arXiv.org Artificial Intelligence

Multi-Task Learning (MTL) is a growing subject of interest in deep learning, due to its ability to train models more efficiently on multiple tasks than a group of conventional single-task models. However, MTL can be impractical because certain tasks can dominate training and hurt performance on others, making some tasks perform better in a single-task model than in a multi-task one. Such problems are broadly classified as negative transfer, and many prior approaches have been proposed in the literature to mitigate them. One current approach to alleviating negative transfer is to weight each of the losses so that they are on the same scale. While current loss-balancing approaches rely on either optimization or complex numerical analysis, none directly scales the losses based on their observed magnitudes. We propose multiple techniques for loss balancing based on scaling by the exponential moving average and benchmark them against current best-performing methods on three established datasets. On these datasets, they achieve comparable, if not higher, performance than current best-performing methods.
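The core idea of scaling losses by their observed magnitudes via an exponential moving average can be sketched as follows. This is a minimal illustration of the general technique, not the authors' code: the class name, the smoothing factor `beta`, and the per-step weighting are assumptions.

```python
class EMALossWeighter:
    """Scale each task loss by the inverse of the EMA of its magnitude,
    so that all tasks contribute to the total loss at a comparable
    scale regardless of their raw units."""

    def __init__(self, num_tasks, beta=0.9, eps=1e-8):
        self.beta = beta          # EMA smoothing factor
        self.eps = eps            # guards against division by zero
        self.ema = [None] * num_tasks

    def __call__(self, losses):
        total = 0.0
        for i, loss in enumerate(losses):
            m = abs(float(loss))
            # initialize the EMA on the first step, then smooth
            self.ema[i] = m if self.ema[i] is None else (
                self.beta * self.ema[i] + (1 - self.beta) * m)
            total += loss / (self.ema[i] + self.eps)
        return total
```

With this weighting, a task whose loss sits around 100 and a task whose loss sits around 1 both contribute roughly unit-scale terms to the combined objective, which is exactly the "same scale" property the abstract describes.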


Building Intelligent Autonomous Navigation Agents

Chaplot, Devendra Singh

arXiv.org Artificial Intelligence

Breakthroughs in machine learning in the last decade have led to `digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of `physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.


Miej/Dynamic_Neural_Manifold

#artificialintelligence

In this project, I've built a neural network architecture with a static execution graph that acts as a dynamic neural network in which connections between various neurons are controlled by the network itself. This is accomplished by manipulating the adjacency matrix representation of the network on a per-neuron basis with cell elements representing a'distance', and masking off connections that are within a threshold. Including a loss term based on the networks sparsity or processing time allows the architecture to optimize its structure for accuracy or speed. Alright, so hopefully I've caught your attention with the title. To begin, I'd like to explain a little behind why I've created this. My educational background is actually in the sciences, just at the junction between chemistry and physics.